Abstract. Given a multiset $X = \{x_1, \ldots, x_n\}$ of real numbers, the floating-point set summation
problem asks for $S_n = x_1 + \cdots + x_n$. Let $E^*$ denote the minimum worst-case error over all possible
orderings of evaluating $S_n$. We prove that if $X$ has both positive and negative numbers, it is NP-hard
to compute $S_n$ with the worst-case error equal to $E^*$. We then give the first known polynomial-time
approximation algorithm that has a provably small error for arbitrary $X$. Our algorithm incurs a
worst-case error at most $2(\lceil \log(n-1) \rceil + 1)E^*$. After $X$ is sorted, it runs in $O(n)$ time. For the
case where $X$ is either all positive or all negative, we give another approximation algorithm with
a worst-case error at most $\lceil \log \log n \rceil E^*$. Even for unsorted $X$, this algorithm runs in $O(n)$ time.
Previously, the best linear-time approximation algorithm had a worst-case error at most $\lceil \log n \rceil E^*$,
while $E^*$ was known to be attainable in $O(n \log n)$ time using Huffman coding.
1. Introduction. Summation of floating-point numbers is ubiquitous in numerical
analysis and has been extensively studied (for example, see [...], [12]). This paper
focuses on the floating-point set summation problem, which, given a multiset
$X = \{x_1, \ldots, x_n\}$ of real numbers, asks for $S_n = x_1 + x_2 + \cdots + x_n$. Without
loss of generality, let $x_i \neq 0$ for all $i$ throughout the paper. Here $X$ may contain both
positive and negative numbers. For such a general $X$, previous studies have discussed
heuristic methods and obtained statistical or empirical bounds for their errors. We
take a new approach by designing efficient algorithms whose worst-case errors are
provably small.

Our error analysis uses the standard model of floating-point arithmetic with unit
roundoff $\alpha < 1$:

$$fl(x + y) = (x + y)(1 + \delta_{xy}), \quad \text{where } |\delta_{xy}| \le \alpha.$$
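As one concrete instance of this model, the sketch below assumes IEEE binary32 arithmetic with round-to-nearest, whose unit roundoff is $2^{-24}$. It emulates a binary32 addition by rounding Python's binary64 sum with the standard `struct` module and measures $|\delta_{xy}|$ exactly with `fractions.Fraction`. The names `ALPHA`, `fl32`, and `relative_error_of_add` are hypothetical helpers chosen for illustration.

```python
import struct
from fractions import Fraction

# Assumption: IEEE binary32 arithmetic with round-to-nearest, unit roundoff 2**-24.
ALPHA = Fraction(1, 2**24)


def fl32(value: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def relative_error_of_add(x: float, y: float) -> Fraction:
    """Return |delta_xy| in fl(x + y) = (x + y)(1 + delta_xy), computed exactly."""
    x, y = fl32(x), fl32(y)              # force both operands to be binary32 numbers
    exact = Fraction(x) + Fraction(y)    # exact real sum (Fractions never round)
    if exact == 0:
        return Fraction(0)
    # Python adds in binary64; rounding that sum to binary32 yields the correctly
    # rounded binary32 sum (double rounding is harmless here since 53 >= 2*24 + 2).
    computed = Fraction(fl32(x + y))
    return abs(computed - exact) / abs(exact)
```

For example, `relative_error_of_add(1.0, 2.0**-30)` is positive but below `ALPHA`, because $1 + 2^{-30}$ rounds back to $1$ in binary32.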

Since the operator + is applied to two operands at a time, an ordering for adding $X$
corresponds to a binary addition tree with $n$ leaves and $n - 1$ internal nodes, where a
leaf is an $x_i$ and an internal node is the sum of its two children. Different orderings
yield different addition trees, which may produce different computed sums $\tilde{S}_n$ in
floating-point arithmetic. We aim to find an optimal ordering that minimizes the
error $E_n = |\tilde{S}_n - S_n|$. Let $I_1, \ldots, I_{n-1}$ be the internal nodes of an addition tree $T$
over $X$, and let $\delta_j$, with $|\delta_j| \le \alpha$, be the relative rounding error committed when $I_j$ is computed.
Since $\alpha$ is very small even on a desktop computer, any product of two or more of the $\delta_j$
is negligible in our consideration. Using this approximation,

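$$\tilde{S}_n \approx S_n + \sum_{j=1}^{n-1} I_j \delta_j.$$

Hence, $E_n \approx \left| \sum_{j=1}^{n-1} I_j \delta_j \right| \le \alpha \sum_{j=1}^{n-1} |I_j|$, giving rise to the following definitions:

• The worst-case error of $T$, denoted by $E(T)$, is $\alpha \sum_{j=1}^{n-1} |I_j|$.

• The cost of $T$, denoted by $C(T)$, is $\sum_{j=1}^{n-1} |I_j|$.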

Our task is to find a fast algorithm that constructs an addition tree $T$ over $X$ such that
$E(T)$ is small. Since $E(T) = \alpha \cdot C(T)$, minimizing $E(T)$ is equivalent to minimizing
$C(T)$. We further adopt the following notation:

• $E^*$ (respectively, $C^*$) is the minimum worst-case error (respectively, minimum
cost) over all orderings of evaluating $S_n$.

• $T_{\min}$ denotes an optimal addition tree over $X$, i.e., $E(T_{\min}) = E^*$ or, equivalently,
$C(T_{\min}) = C^*$.
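To make these definitions concrete, here is a minimal sketch that computes $C(T)$ and the first-order bound $E(T) = \alpha \cdot C(T)$ exactly. It assumes a tree encoding of our own choosing: a leaf is a number and an internal node is a pair `(left, right)`; the function names are hypothetical.

```python
def value_and_cost(tree):
    """Return (value, C(T)) for an addition tree.

    A leaf is a number; an internal node is a 2-tuple (left, right).
    C(T) is the sum of |I_j| over all internal nodes I_j.
    """
    if not isinstance(tree, tuple):
        return tree, 0
    left_value, left_cost = value_and_cost(tree[0])
    right_value, right_cost = value_and_cost(tree[1])
    node = left_value + right_value          # value of this internal node I_j
    return node, left_cost + right_cost + abs(node)


def worst_case_error(tree, alpha):
    """First-order worst-case error E(T) = alpha * C(T)."""
    return alpha * value_and_cost(tree)[1]
```

For $X = \{1, 2, 3\}$, the tree `((1, 2), 3)` has cost $3 + 6 = 9$, while `(1, (2, 3))` has cost $5 + 6 = 11$, so the two orderings carry different worst-case errors.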

In §2, we prove that if $X$ contains both positive and negative numbers, it is NP-hard
to compute a $T_{\min}$. In light of this result, we design an approximation algorithm
in §3.1 that computes a tree $T$ with $E(T) \le 2(\lceil \log(n-1) \rceil + 1)E^*$. After $X$ is
sorted, this algorithm takes only $O(n)$ time. This is the first known polynomial-time
approximation algorithm that has a provably small error for arbitrary $X$. For the
case where $X$ is either all positive or all negative, we give another approximation
algorithm in §3.2 that computes a tree $T$ with $E(T) \le (1 + \lceil \log \log n \rceil)E^*$. This
algorithm takes only $O(n)$ time even for unsorted $X$. Previously [...], the best linear-time
approximation algorithm had a worst-case error at most $\lceil \log n \rceil E^*$, while $E^*$
was known to be attainable in $O(n \log n)$ time using Huffman coding [9].

2. Minimizing the worst-case error is NP-hard. If $X$ contains both positive
and negative numbers, we prove that it is NP-hard to find a $T_{\min}$. We first observe
the following properties of $T_{\min}$.

Lemma 2.1. Let $z$ be an internal node in $T_{\min}$ with children $z_1$ and $z_2$, sibling $u$,
and parent $r$.

1. If $z > 0$, $z_1 > 0$, and $z_2 > 0$, then $u \ge 0$ or $r < 0$.

2. If $z < 0$, $z_1 < 0$, and $z_2 < 0$, then $u \le 0$ or $r > 0$.

Proof. By symmetry, we only prove the first statement. Writing $T_s$ for the subtree rooted
at a node $s$, we have $C(T_r) = |r| + |z| + C'$, where $C' = C(T_{z_1}) + C(T_{z_2}) + C(T_u)$.
Assume to the contrary that $u < 0$ and $r \ge 0$. Then $z \ge |u|$. We swap $T_{z_1}$ with $T_u$.
Let $z' = u + z_2$. Now $r$ becomes the parent of $z'$ and $z_1$. This rearrangement of nodes
does not affect the value of node $r$, and the costs of $T_{z_1}$, $T_{z_2}$, and $T_u$ remain unchanged.
Let $T'_r$ be the new subtree with root $r$, and let $T'$ be the entire new tree resulting from the
swap. Since $u$ and $z_2$ have opposite signs, $|z'| < \max\{|u|, z_2\} \le z$. Hence,
$C(T'_r) = |r| + |z'| + C' < |r| + z + C' = C(T_r)$. Thus, $C(T') < C(T_{\min})$, contradicting
the optimality of $T_{\min}$. This completes the proof. □

For the purpose of proving that finding a $T_{\min}$ is NP-hard, we restrict all $x_i$ to
nonzero integers and consider the following optimization problem.
MINIMUM ADDITION TREE (MAT)
Input: A multiset $X$ of $n$ nonzero integers $x_1, \ldots, x_n$.
Output: Some $T_{\min}$ over $X$.

The following problem is a decision version of MAT.
ADDITION TREE (AT)

Instance: A multiset $X$ of $n$ nonzero integers $x_1, \ldots, x_n$, and an integer $k \ge 0$.
Question: Does there exist an addition tree $T$ over $X$ with $C(T) \le k$?
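For very small inputs, MAT and AT can be solved by exhaustive search. The sketch below is a hypothetical exponential-time reference, not an algorithm from this paper; the names `min_cost` and `addition_tree_decision` are ours. It relies on the fact that any addition tree can be built bottom-up by repeatedly adding two current subtree roots, and that the remaining cost depends only on the multiset of current root values.

```python
from functools import lru_cache
from itertools import combinations


def min_cost(xs):
    """Return C* = min C(T) over all addition trees over the multiset xs.

    States are memoized as sorted tuples of current root values.
    Exponential time: intended for tiny inputs only.
    """
    @lru_cache(maxsize=None)
    def best(roots):
        if len(roots) == 1:
            return 0
        result = None
        for i, j in combinations(range(len(roots)), 2):
            s = roots[i] + roots[j]  # value of the new internal node
            rest = [r for k, r in enumerate(roots) if k != i and k != j]
            candidate = abs(s) + best(tuple(sorted(rest + [s])))
            if result is None or candidate < result:
                result = candidate
        return result

    return best(tuple(sorted(xs)))


def addition_tree_decision(xs, k):
    """Decide AT: does some addition tree T over xs satisfy C(T) <= k?"""
    return min_cost(xs) <= k
```

For example, `min_cost([5, -5, 1])` is 1 (add $5$ and $-5$ first), so the AT instance with $k = 1$ is positive and the one with $k = 0$ is not.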
Lemma 2.2. If MAT is solvable in time polynomial in $n$, then AT is also solvable
in time polynomial in $n$.



Proof. Straightforward: compute a $T_{\min}$ and compare $C(T_{\min})$ with $k$. □

In light of Lemma 2.2, to prove that MAT is NP-hard, it suffices to reduce the following
NP-complete problem [...] to AT.
3-PARTITION (3PAR)

Instance: A multiset $B$ of $3m$ positive integers $b_1, \ldots, b_{3m}$, and a positive integer
$K$ such that $K/4 < b_i < K/2$ and $b_1 + \cdots + b_{3m} = mK$.

Question: Can $B$ be partitioned into $m$ disjoint sets $B_1, \ldots, B_m$ such that for
each $B_i$, $\sum_{b \in B_i} b = K$? ($B_i$ must therefore contain exactly three elements from $B$.)

Given an instance $(B, K)$ of 3PAR, let

$$W = 100(5m)^2 K; \qquad a_i = b_i + W; \qquad A = \{a_1, \ldots, a_{3m}\}; \qquad L = 3W + K.$$

Lemma 2.3. $(A, L)$ is an instance of 3PAR. Furthermore, it is a positive instance
if and only if $(B, K)$ also is.

Proof. Since $K/4 < b_i < K/2$, we have $K/4 + W < a_i < K/2 + W$ and thus $L/4 < a_i <
L/2$. Next, $a_1 + a_2 + \cdots + a_{3m} = 3mW + mK = mL$. This completes the proof of the
first statement. The second statement follows from the fact that $b_i + b_j + b_k = K$ if
and only if $a_i + a_j + a_k = L$. □

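Write

$$\epsilon = \frac{1}{400(5m)^2}; \qquad h = \lfloor 4\epsilon L \rfloor; \qquad H = L + h;$$

$$h = \beta_0 H; \qquad a_i = \left(\frac{1}{3} + \beta_i\right)H; \qquad a_i = \left(\frac{1}{3} + \epsilon_i\right)L; \qquad a_M = \max\{a_i : i = 1, \ldots, 3m\},$$

where the equations $h = \beta_0 H$, $a_i = (1/3 + \beta_i)H$, and $a_i = (1/3 + \epsilon_i)L$ define $\beta_0$, $\beta_i$, and $\epsilon_i$.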

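Lemma 2.4.

1. $|\epsilon_i| < \epsilon$ for $i = 1, \ldots, 3m$.

2. $0 < \beta_0 < 4\epsilon$, and $|\beta_i| < 4\epsilon$ for $i = 1, \ldots, 3m$.

3. $3a_M < H$.
Proof.

Statement 1. Note that $b_i + W = (1/3 + \epsilon_i)(3W + K)$. Thus, $b_i = K/3 +
\epsilon_i(300(5m)^2 + 1)K$. Since $K/4 < b_i < K/2$, $-1/12 < \epsilon_i(300(5m)^2 + 1) < 1/6$. Hence,
$4(5m)^2|\epsilon_i| < 10^{-2}$, i.e., $|\epsilon_i| < \epsilon$.

Statement 2. Since $4\epsilon L = 3K + K/(100(5m)^2) > 1$, we have $h \ge 1$ and thus $\beta_0 > 0$. Also, since
$H > L$ and $\beta_0 H = \lfloor 4\epsilon L \rfloor \le 4\epsilon L$, we have $\beta_0 < 4\epsilon$. Next, for each $a_i$, we have
$\beta_i = (\epsilon_i L - h/3)/(L + h)$. Then by the triangle inequality and Statement 1, $|\beta_i| < 7\epsilon/3 < 4\epsilon$.

Statement 3. By Statement 1, $a_M < (1/3 + \epsilon)L$. Thus $3a_M < L + 3\epsilon L$. Then,
since $3\epsilon L < 3K \le \lfloor 4\epsilon L \rfloor = h$, we have $3a_M < L + h = H$. □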

To reduce $(A, L)$ to an instance of AT, we consider the particular multiset

$$X = A \cup \{-H, \ldots, -H\} \cup \{h, \ldots, h\}$$

with $m$ copies each of $-H$ and $h$. Given a node $s$ in $T_{\min}$, let $T_s$ denote the subtree
rooted at $s$. For convenience, also let $s$ denote the value of node $s$. Let $v(T_{\min})$ denote
the value of the root of $T_{\min}$, which is always $mL - mH + mh = 0$. For brevity, we use $\lambda$,
with or without subscripts or primes, to denote a sum of at most $5m$ numbers of the form $\pm\beta_i$
($0 \le i \le 3m$). Then all nodes are of the form $(N/3 + \lambda)H$ for some integer $N$ and some $\lambda$.
Since, by Lemma 2.4, $|\lambda| < (5m)(4\epsilon) = 1/(500m)$, the terms $N$ and $\lambda$ of each node are
uniquely determined. The nodes of the form $\lambda H$ are called type-0 nodes. Note that $T_{\min}$ has
$m$ type-0 leaves, i.e., the $m$ copies of $h$ in $X$.
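The construction so far can be replayed with exact rational arithmetic. The sketch below builds $W$, $A$, $L$, $\epsilon$, $h$, $H$, and $X$ from a 3PAR instance and checks Lemma 2.4 directly. The helper names `reduce_3par` and `check_lemma_2_4`, and the toy instance $B = \{6, 7, 7, 6, 7, 7\}$, $K = 20$ (so $m = 2$), are illustrative assumptions.

```python
from fractions import Fraction
from math import floor


def reduce_3par(B, K):
    """Build (A, L, eps, h, H, X) from a 3PAR instance (B, K) as in Section 2."""
    m = len(B) // 3
    W = 100 * (5 * m) ** 2 * K
    A = [b + W for b in B]
    L = 3 * W + K
    eps = Fraction(1, 400 * (5 * m) ** 2)
    h = floor(4 * eps * L)              # h = floor(4 eps L), an int
    H = L + h
    X = A + [-H] * m + [h] * m          # m copies each of -H and h
    return A, L, eps, h, H, X


def check_lemma_2_4(A, eps, h, H):
    """Exactly check 0 < beta_0 < 4 eps, |beta_i| < 4 eps, and 3 a_M < H."""
    beta0 = Fraction(h, H)
    betas = [Fraction(a, H) - Fraction(1, 3) for a in A]
    return (0 < beta0 < 4 * eps
            and all(abs(b) < 4 * eps for b in betas)
            and 3 * max(A) < H)
```

For the toy instance, $W = 200000$, $L = 600020$, $4\epsilon L = 60.002$, so $h = 60$ and $H = 600080$; the numbers in $X$ sum to zero, as the text requires of $v(T_{\min})$.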

Lemma 2.5. In $T_{\min}$, type-0 nodes can only be added to type-0 nodes.

Proof. Assume to the contrary that a type-0 node $z_1$ is added to a node $z_2$ of the
form $(\pm N/3 + \lambda)H$ with $N \ge 1$. Then $|z_1 + z_2| \ge (1/3 + \lambda')H$ for some $\lambda'$. Let $z$
be the parent of $z_1$ and $z_2$. Since $v(T_{\min}) = 0$, $z$ cannot be the root of $T_{\min}$. Let $u$ be
the sibling of $z$, let $r$ be the parent of $z$ and $u$, and let $t$ be the root of $T_{\min}$. Let $P_r$ be
the path from $t$ to $r$ in $T_{\min}$, and let $m_r$ be the number of nodes on $P_r$. Since $T_{\min}$ has
$5m - 1$ internal nodes, $m_r \le 5m - 1$.

We rearrange $T_{\min}$ to obtain a new tree $T'$ as follows. First, we replace $T_z$ with
$T_{z_2}$; i.e., $r$ now has subtrees $T_{z_2}$ and $T_u$. Let $T''$ be the remaining tree; i.e., $T''$
is $T_{\min}$ after removing $T_{z_1}$. Next, we create $T'$ such that its root has subtrees $T_{z_1}$
and $T''$. This tree rearrangement eliminates the cost $|z_1 + z_2|$ from $T_r$, but it changes the
value of each node on $P_r$ by $-z_1$ and may therefore add a cost of the form $\lambda H$ at each such node. The total of these
extra costs, denoted by $C_\lambda$, is at most $m_r(5m)(4\epsilon)H \le (5m - 1)(5m)(4\epsilon)H$. Using $|\lambda'| < (5m)(4\epsilon)$, we obtain

$$C(T') = C(T_{\min}) - |z_1 + z_2| + C_\lambda \le C(T_{\min}) - (1/3 + \lambda')H + C_\lambda \le C(T_{\min}) + \left(-1/3 + (5m)^2(4\epsilon)\right)H = C(T_{\min}) + (-1/3 + 10^{-2})H < C(T_{\min}),$$

contradicting the optimality of $T_{\min}$. This completes the proof. □

Lemma 2.6. Let $z$ be a node in $T_{\min}$.

1. If $z < 0$, then $|z| \le H$.

2. If $z > 0$, then $z < H$.
Proof.

Statement 1. Assume that the statement is untrue. Then, since all negative leaves
have value $-H$, some negative internal node $z$ has an absolute value greater than $H$
and two negative children $z_1$ and $z_2$. Since $v(T_{\min}) = 0$, some such $z$ has a positive sibling
$u$. We pick such a $z$ at the lowest possible level of $T_{\min}$. Let $r$ be the parent of $z$
and $u$. By Lemma 2.1(2), $r > 0$. Then $u > |z| > H$. Since all positive leaves have
values less than $H$, $u$ is an internal node with two children $u_1$ and $u_2$. Since $u > 0$,
$z < 0$, and $r > 0$, by Lemma 2.1(1), $u$ must have a positive child and a negative child.
Without loss of generality, let $u_1$ be positive and $u_2$ be negative. Then $u = u_1 - |u_2|$.
Since $z$ is at the lowest possible level, $|u_2| \le H$, for otherwise we could find such a $z$ at a
lower level under $u_2$. We swap $T_z$ with $T_{u_2}$. Let $T'_r$ be the new subtree rooted at $r$, and let
$u' = u_1 + z$. Since $u_2 + u' = r > 0$ and $u_2 < 0$, we have $u' > 0$. Since $|u_2| \le H < |z|$,
we have $u' = u_1 - |z| < u_1 - |u_2| = u$. Let $C' = C(T_z) + C(T_{u_1}) + C(T_{u_2})$. Then
$C(T'_r) = r + u' + C' < r + u + C' = C(T_r)$, which contradicts the optimality of $T_{\min}$
because the costs of the internal nodes not mentioned above remain unchanged.

Statement 2. Assume that this statement is false. Then, since all positive leaves
have values less than $H$, some internal node $z$ has a value of at least $H$ as well as
two positive children. Since $v(T_{\min}) = 0$, some such $z$ has a negative sibling $u$. By
Statement 1, $|u| \le H$. Hence, their parent $r = z + u \ge 0$, contradicting Lemma 2.1(1). □

The following lemma strengthens Lemma 2.6.

Lemma 2.7.

1. Let $z$ be a node in $T_{\min}$. If $z > 0$, then $z$ is of the form $\lambda H$, $(1/3 + \lambda)H$,
or $(2/3 + \lambda)H$.

2. Let $z$ be an internal node in $T_{\min}$. If $z < 0$, then $z$ is of the form $\lambda H$,
$(-1/3 + \lambda)H$, or $(-2/3 + \lambda)H$.

Proof.

Statement 1. By Lemma 2.6, $z < H$. Thus, $z = (N/3 + \lambda)H$ with $0 \le N \le 3$.

To rule out $N = 3$ by contradiction, assume $z = (1 + \lambda)H$ with $\lambda < 0$. Since by
Lemma 2.4 all positive leaves have values less than $(1/3 + 4\epsilon)H$, $z$ is an internal node.
Since $v(T_{\min}) = 0$, $z$ is not the root, and by Lemmas 2.5 and 2.6, $z$ has a negative
sibling $u$ and two children $z_1 = (2/3 + \lambda')H$ and $z_2 = (1/3 + \lambda'')H$. By Lemma 2.6, $|u| \le H$.
Let $r$ be the parent of $z$ and $u$. Then $C(T_r) = |r| + z + C(T_{z_1}) + C(T_{z_2}) + C(T_u)$.
We swap $T_{z_2}$ with $T_u$. Let $z'$ be the parent of $z_1$ and $u$; now $r$ is the parent of $z'$
and $z_2$. Let $T'_r$ be the new subtree rooted at $r$ after the swap. Since $r$ remains the same,
$C(T'_r) = |r| + |z'| + C(T_{z_1}) + C(T_{z_2}) + C(T_u)$.
If $|u| \ge z_1$, then $|z'| = |u| - z_1 \le H - z_1 = (1/3 - \lambda')H < z_1 < z$; otherwise, $|u| < z_1$
and thus $|z'| = z_1 - |u| < z_1 < z$. In either case, $C(T'_r) < C(T_r)$, contradicting the
optimality of $T_{\min}$.

Statement 2. The proof is similar to that of Statement 1. By Lemma 2.6, $z =
(-N/3 + \lambda)H$ with $0 \le N \le 3$. To rule out $N = 3$ by contradiction, assume $z =
(-1 + \lambda)H$ with $\lambda \ge 0$. By Lemmas 2.5 and 2.6, $z$ has a positive sibling $u < H$ and two
children $z_1 = (-2/3 + \lambda')H$ and $z_2 = (-1/3 + \lambda'')H$. Let $r$ be the parent of $z$ and $u$.
Then $C(T_r) = |r| + |z| + C(T_{z_1}) + C(T_{z_2}) + C(T_u)$. We swap $T_{z_2}$ with $T_u$. Let $z'$ be the
parent of $z_1$ and $u$; now $r$ is the parent of $z'$ and $z_2$. Let $T'_r$ be the new subtree rooted at
$r$ after the swap. Since $r$ is the same, $C(T'_r) = |r| + |z'| + C(T_{z_1}) + C(T_{z_2}) + C(T_u)$.
If $u \ge |z_1|$, then $|z'| = u - |z_1| < H - |z_1| = (1/3 + \lambda')H < |z|$; otherwise, $u < |z_1|$ and thus
$|z'| = |z_1| - u < |z_1| < |z|$. So $C(T'_r) < C(T_r)$, contradicting the optimality of $T_{\min}$. □

The following lemma supplements Lemma 2.7(1).

Lemma 2.8. Let $z$ be a node in $T_{\min}$. If $z = (1/3 + \lambda)H$, then $z$ is a leaf.
Proof. Assume to the contrary that $z = (1/3 + \lambda)H$ is not a leaf. By Lemmas 2.5 and 2.7,
$z$ has two children $z_1 = (2/3 + \lambda_1)H$ and $z_2 = (-1/3 + \lambda_2)H$. No leaf has the form of $z_1$,
so by Lemmas 2.5 and 2.7, $z_1$ has two children $z_3 = (1/3 + \lambda_3)H$ and $z_4 = (1/3 + \lambda_4)H$.
Thus $z_1 > 0$ has two positive children, its sibling $z_2$ is negative, and its parent $z$ is positive,
contradicting Lemma 2.1(1). □

The following lemma strengthens Lemma 2.7(2).

Lemma 2.9. Let $z$ be an internal node in $T_{\min}$. If $z < 0$, then $z$ can only be of
the form $\lambda H$ or $(-1/3 + \lambda)H$.

Proof. To prove the lemma by contradiction, by Lemma 2.7, we assume $z =
(-2/3 + \lambda)H$. Let $z_1$ and $z_2$ be the two children of $z$. Let $u$ be the sibling of $z$; by
Lemmas 2.5 and 2.7, $u = (2/3 + \lambda')H$ or $(1/3 + \lambda')H$. Let $r$ be the parent of $z$ and
$u$. Then $C(T_r) = |r| + |z| + C(T_{z_1}) + C(T_{z_2}) + C(T_u)$. By the preceding lemmas, there
are two cases based on the values of $z_1$ and $z_2$, with the symmetric cases omitted.

Case 1: $z_1 = (-1/3 + \lambda_1)H$ and $z_2 = (-1/3 + \lambda_2)H$. Swap $T_u$ with $T_{z_2}$. Let $z'$
be the new parent of $z_1$ and $u$; then $r$ is the parent of $z'$ and $z_2$. Let $T'_r$ be the new
subtree rooted at $r$. Then $C(T'_r) = |r| + |z'| + C(T_{z_1}) + C(T_{z_2}) + C(T_u)$. Whether
$u = (2/3 + \lambda')H$ or $(1/3 + \lambda')H$, we have $|z'| < |z|$ and thus $C(T'_r) < C(T_r)$, which
contradicts the optimality of $T_{\min}$.

Case 2: $z_1 = (1/3 + \lambda_1)H$ and $z_2 = -H$. There are two subcases based on $u$.

Case 2A: $u = (2/3 + \lambda')H$. We swap $T_{z_1}$ with $T_u$. Let $z'$ be the new parent of $z_2$
and $u$. Then $|z'| < |z|$.

Case 2B: $u = (1/3 + \lambda')H$. We swap $T_{z_2}$ with $T_u$. Let $z'$ be the new parent of $z_1$
and $u$. By Lemma 2.8, both $z_1$ and $u$ are leaves, and thus by Lemma 2.4, $2z_1 + u < H$.
Therefore, $|z'| = z_1 + u < H - z_1 = |z|$.

Therefore, in either subcase of Case 2 the swap results in an addition tree
over $X$ with smaller cost than $T_{\min}$, reaching a contradiction. □

Lemma 2.10. $C(T_{\min}) \ge m(H + h)$. Moreover, $C(T_{\min}) = m(H + h)$ if and only
if $(A, L)$ is a positive instance of 3PAR.

Proof. By Lemmas 2.5, 2.7, 2.8, and 2.9, each $a_i \in A$ can only be added to
some $a_j \in A$ or to some $z_1 = (-1/3 + \lambda_1)H$. In turn, $z_1$ can only be the sum of
$-H$ and some $z_2 = (2/3 + \lambda_2)H$. In turn, $z_2$ is the sum of some $a_i$ and $a_j \in A$.
Hence, in $T_{\min}$, $2m$ leaves in $A$ are added in pairs. The sum of each pair is then
added to a leaf node $-H$, and this sum is then added to a leaf node in $A$. The result is a
type-0 node whose value $-|\lambda'|H$ is negative by Lemma 2.4(3); it can only be added to another type-0 node. Let
$a_{p,1}, a_{p,2}, a_{p,3}$ be the three leaves in $A$ associated with each $-H$ and added together as
$((a_{p,1} + a_{p,2}) + (-H)) + a_{p,3}$ in $T_{\min}$, and call this subtree $R_p$. The cost of $R_p$ is
$(a_{p,1} + a_{p,2}) + (H - a_{p,1} - a_{p,2}) + (H - a_{p,1} - a_{p,2} - a_{p,3}) = 2H - (a_{p,1} + a_{p,2} + a_{p,3})$.
There are $m$ such subtrees $R_p$. Their total cost is $2mH - \sum_{i=1}^{3m} a_i = 2mH - mL = mH + mh$. Hence,
$C(T_{\min}) \ge mH + mh$.

If $(A, L)$ is not a positive instance of 3PAR, then for any $T_{\min}$, there is some
subtree $R_p$ with $a_{p,1} + a_{p,2} + a_{p,3} \ne L$. Then the value of the root $r_p$ of $R_p$ is
$a_{p,1} + a_{p,2} + a_{p,3} - H \ne -h$. Since $r_p$ is a type-0 node, it can only be added to a type-0
node. No matter how the $m$ root values and the $m$ leaves $h$ are added, some node
resulting from adding these $2m$ numbers is nonzero. Hence, $C(T_{\min}) > mH + mh$.

If $(A, L)$ is a positive instance of 3PAR, let the sets $\{a_{p,1}, a_{p,2}, a_{p,3}\}$ with $1 \le p \le m$
form a 3-set partition of $A$; i.e., $A$ is the union of these $m$ 3-sets and for each $p$,
$a_{p,1} + a_{p,2} + a_{p,3} = L$. Then each 3-set can be added to one $-H$ and one $h$ as
$(((a_{p,1} + a_{p,2}) + (-H)) + a_{p,3}) + h$, resulting in a node of value zero and contributing
no extra cost. Hence, $C(T_{\min}) = mH + mh$. This completes the proof. □
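The positive direction of Lemma 2.10 can be replayed numerically. The sketch below assembles the tree $(((a_{p,1} + a_{p,2}) + (-H)) + a_{p,3}) + h$ for each triple, adds the $m$ results in a chain, and confirms that the cost equals $m(H + h)$; it also shows that one particular non-partition grouping costs more. The toy instance ($B = \{6, 7, 7, 6, 7, 7\}$, $K = 20$) and all helper names are illustrative assumptions, and the sketch does not search over all trees.

```python
def value_and_cost(tree):
    """Return (value, C(T)); a leaf is a number, an internal node is a pair."""
    if not isinstance(tree, tuple):
        return tree, 0
    left_value, left_cost = value_and_cost(tree[0])
    right_value, right_cost = value_and_cost(tree[1])
    node = left_value + right_value
    return node, left_cost + right_cost + abs(node)


def canonical_tree(triples, H, h):
    """Add each triple as (((a1 + a2) + (-H)) + a3) + h, then chain the m results."""
    subtrees = [((((a1, a2), -H), a3), h) for a1, a2, a3 in triples]
    tree = subtrees[0]
    for sub in subtrees[1:]:
        tree = (tree, sub)
    return tree


# Toy instance (assumption): B = [6, 7, 7, 6, 7, 7], K = 20, m = 2.
m, K = 2, 20
W = 100 * (5 * m) ** 2 * K            # 200000
L = 3 * W + K                         # 600020
h = L // (100 * (5 * m) ** 2)         # floor(4 eps L) with eps = 1 / (400 (5m)^2)
H = L + h                             # 600080

partition = [(W + 6, W + 7, W + 7), (W + 6, W + 7, W + 7)]   # each triple sums to L
mismatched = [(W + 6, W + 6, W + 7), (W + 7, W + 7, W + 7)]  # sums L - 1 and L + 1
```

Here each $R_p$ costs $2H - L = 600140$, so the partition tree costs $1200280 = m(H + h)$, while the mismatched grouping leaves nonzero type-0 nodes of value $\pm 1$ and costs slightly more.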

Theorem 2.11. It is NP-hard to compute an optimal addition tree over a multiset
that contains both positive and negative numbers.

Proof. By Lemma 2.2, it suffices to construct a reduction $f$ from 3PAR to AT. Let
$f(B, K) = (X, mH + mh)$, which is polynomial-time computable. By Lemma 2.10,
$(X, mH + mh)$ is a positive instance of AT if and only if $(A, L)$ is a positive instance
of 3PAR. Then, by Lemma 2.3, $f$ is a desired reduction. □



3. Approximation algorithms. In light of Theorem 2.11, for $X$ with both
positive and negative numbers, no polynomial-time algorithm can find a $T_{\min}$ unless
P = NP [...]. This motivates the consideration of approximation algorithms.

3.1. Linear-time approximation for general X. This section assumes that
$X$ contains at least one positive number and one negative number. We give an
approximation algorithm whose worst-case error is at most $2(\lceil \log(n-1) \rceil + 1)E^*$. If $X$
is sorted, this algorithm takes only $O(n)$ time.

In an addition tree, a leaf is critical if its sibling is a leaf with the opposite sign. 
Note that if two leaves are siblings, then one is critical if and only if the other is 
critical. Hence, an addition tree has an even number of critical leaves. 

Lemma 3.1. Let $T$ be an addition tree over $X$. Let $y_1, \ldots, y_{2k}$ be its critical
leaves, where $y_{2i-1}$ and $y_{2i}$ are siblings. Let $z_1, \ldots, z_{n-2k}$ be the noncritical leaves.
Let $\Pi = \sum_{i=1}^{k} |y_{2i-1} + y_{2i}|$ and $\Delta = \sum_{j=1}^{n-2k} |z_j|$. Then $C(T) \ge (\Pi + \Delta)/2$.

Proof. Let x be a leaf in T. There are two cases. 

Case 1: $x$ is some critical leaf $y_{2i-1}$ or $y_{2i}$. Let $p_i$ be the parent of $y_{2i-1}$ and $y_{2i}$
in $T$ for $1 \le i \le k$. Then $|p_i| = |y_{2i-1} + y_{2i}|$.

Case 2: $x$ is some noncritical leaf $z_j$. Let $w_j$ be the sibling of $z_j$ in $T$, and let $q_j$ be
the parent of $z_j$ and $w_j$. There are three subcases.

Case 2A: $w_j$ is also a leaf. Since $z_j$ is noncritical, $w_j$ has the same sign as $z_j$ and
is also a noncritical leaf. Thus, $|q_j| = |z_j| + |w_j|$.

Case 2B: $w_j$ is an internal node with the same sign as $z_j$. Then $|q_j| \ge |z_j|$.


Case 2C: $w_j$ is an internal node with the opposite sign to $z_j$. If $|w_j| \ge |z_j|$,
then $|q_j| + |w_j| \ge |z_j|$; if $|w_j| < |z_j|$, then $|q_j| + |w_j| = |z_j|$. So we always have
$|q_j| + |w_j| \ge |z_j|$.

Minimization of Numerical Summation Errors

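Observe that the nodes $p_i$ and $q_j$ are distinct internal nodes, and the nodes $w_j$ of Case 2C are also
distinct internal nodes. Hence

$$C(T) \ge \sum_{i=1}^{k} |p_i| + \sum_{z_j \text{ in Case 2A}} |z_j| + \sum_{z_j \text{ in Case 2B}} |z_j| + \sum_{z_j \text{ in Case 2C}} |q_j|,$$

$$C(T) \ge \sum_{z_j \text{ in Case 2C}} |w_j|.$$

Simplifying the sum of these two inequalities based on the case analysis, we have
$2C(T) \ge \Pi + \Delta$ as desired. □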



In view of Lemma 3.1, we desire to minimize $\Pi + \Delta$ over all possible $T$. Given
$x_t, x_{t'} \in X$ with $t \ne t'$, $(x_t, x_{t'})$ is a critical pair if $x_t$ and $x_{t'}$ have opposite signs.
A critical matching $R$ of $X$ is a set $\{(x_{t_{2i-1}}, x_{t_{2i}}) : i = 1, \ldots, k\}$ of critical pairs where
the indices $t_j$ are all distinct. For simplicity, let $y_j = x_{t_j}$. Let $\Pi = \sum_{i=1}^{k} |y_{2i-1} + y_{2i}|$
and $\Delta = \sum_{t \notin \{t_1, \ldots, t_{2k}\}} |x_t|$. If $\Pi + \Delta$ is the minimum over all critical matchings of
$X$, then $R$ is called a minimum critical matching of $X$. Such an $R$ can be computed as
follows. Assume that $X$ consists of $\ell$ positive numbers $a_1 \le \cdots \le a_\ell$ and $m$ negative
numbers $-b_1 \ge \cdots \ge -b_m$.

Algorithm 1. 

1. If $\ell = m$, let $R = \{(a_i, -b_i) : i = 1, \ldots, \ell\}$.

2. If $\ell < m$, let $R = \{(a_i, -b_{i+m-\ell}) : i = 1, \ldots, \ell\}$.

3. If $\ell > m$, let $R = \{(a_{i+\ell-m}, -b_i) : i = 1, \ldots, m\}$.
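Algorithm 1 translates directly into a few lines of Python. The sketch below assumes the positives are given as a sorted list `pos` and the negatives as a sorted list `neg` of their magnitudes $b_1 \le \cdots \le b_m$; it also includes a hypothetical exhaustive checker, used only to compare $\Pi + \Delta$ on tiny random inputs. All function names are our own.

```python
import random


def min_critical_matching(pos, neg):
    """Algorithm 1 (sketch): return the critical pairs (a, -b) of R.

    pos: positive numbers a_1 <= ... <= a_l.
    neg: magnitudes b_1 <= ... <= b_m of the negatives -b_1 >= ... >= -b_m.
    """
    l, m = len(pos), len(neg)
    if l <= m:  # Steps 1-2: pair a_i with -b_{i+m-l}; the m - l smallest b's stay unmatched
        return [(pos[i], -neg[i + m - l]) for i in range(l)]
    # Step 3: pair a_{i+l-m} with -b_i; the l - m smallest a's stay unmatched
    return [(pos[i + l - m], -neg[i]) for i in range(m)]


def pi_plus_delta(pos, neg, pairs):
    """Pi + Delta: |a - b| for each pair plus |x| for every unmatched number."""
    total = sum(pos) + sum(neg)          # start as if every number were unmatched
    for a, minus_b in pairs:
        total += abs(a + minus_b) - (a - minus_b)   # replace a + b by |a - b|
    return total


def brute_force_min(pos, neg):
    """Exhaustive minimum of Pi + Delta over all critical matchings (tiny inputs only)."""
    def rec(i, used):
        if i == len(pos):
            return sum(b for j, b in enumerate(neg) if j not in used)
        best = pos[i] + rec(i + 1, used)  # leave a_i unmatched
        for j, b in enumerate(neg):
            if j not in used:
                best = min(best, abs(pos[i] - b) + rec(i + 1, used | {j}))
        return best
    return rec(0, frozenset())


def agrees_with_brute_force(trials=200, seed=0):
    """Compare Algorithm 1 with exhaustive search on random small inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        pos = sorted(rng.randint(1, 20) for _ in range(rng.randint(0, 4)))
        neg = sorted(rng.randint(1, 20) for _ in range(rng.randint(0, 4)))
        if pi_plus_delta(pos, neg, min_critical_matching(pos, neg)) != brute_force_min(pos, neg):
            return False
    return True
```

For example, with positives $\{5\}$ and negatives $\{-1, -6\}$, the sketch pairs $5$ with $-6$ and leaves $-1$ unmatched, giving $\Pi + \Delta = 1 + 1 = 2$.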

Lemma 3.2. If $X$ is sorted, then Algorithm 1 computes a minimum critical
matching $R$ of $X$ in $O(n)$ time.

Proof. By case analysis, if $a_j \le a_{j'}$ and $b_j \le b_{j'}$, then $|a_j - b_j| + |a_{j'} - b_{j'}| \le
|a_j - b_{j'}| + |a_{j'} - b_j|$. Moreover, since $|a - b| \le a + b$, leaving a positive and a negative number
both unmatched is never better than pairing them. Thus, if $\ell = m$, then pairing $a_i$ with $-b_i$ yields the minimum
$\Pi + \Delta$. For the case $\ell < m$, let $e$ be an infinitesimally small positive number, and let $X'$
be $X$ with $m - \ell$ additional copies of $e$. Then $\sum_{i=1}^{\ell} |a_i - b_{i+m-\ell}| + \sum_{i=1}^{m-\ell} |e - b_i| =
(\ell - m)e + \Pi + \Delta$ is the minimum over all possible critical matchings of $X'$. Thus,
$\Pi + \Delta$ is the minimum over all possible critical matchings of $X$. The case $\ell > m$ is
symmetric to the case $\ell < m$. Since $X$ is sorted, the running time of Algorithm 1 is
$O(n)$. □

We now present an approximation algorithm to compute the summation over X. 
Algorithm 2. 

1. Use Algorithm 1 to find a minimum critical matching $R$ of $X$. The numbers
$x_i$ in the pairs of $R$ are the critical leaves in our addition tree over $X$, and
those not in the critical pairs are the noncritical leaves.

2. Add each critical pair of $R$ separately.

3. Construct a balanced addition tree over the resulting sums of Step 2 and the
noncritical leaves.
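The three steps of Algorithm 2 can be sketched as follows, assuming nonzero inputs with at least one positive and one negative number, as this section does. The balanced tree is built by recursive halving, which gives depth $\lceil \log(k) \rceil$ over $k$ items. The function names, the tuple encoding of trees, and the exhaustive `min_cost` reference (usable only for tiny $n$) are illustrative assumptions; `bound_holds` spot-checks the bound of Theorem 3.3 on random small inputs.

```python
import math
import random
from functools import lru_cache
from itertools import combinations


def value_and_cost(tree):
    """Return (value, C(T)); a leaf is a number, an internal node is a pair."""
    if not isinstance(tree, tuple):
        return tree, 0
    lv, lc = value_and_cost(tree[0])
    rv, rc = value_and_cost(tree[1])
    return lv + rv, lc + rc + abs(lv + rv)


def balanced(items):
    """Balanced addition tree over a list of subtrees, built by halving."""
    if len(items) == 1:
        return items[0]
    mid = len(items) // 2
    return (balanced(items[:mid]), balanced(items[mid:]))


def algorithm2_tree(xs):
    """Algorithm 2 (sketch) for nonzero xs with both signs present."""
    pos = sorted(x for x in xs if x > 0)
    neg = sorted(-x for x in xs if x < 0)    # magnitudes b_1 <= ... <= b_m
    l, m = len(pos), len(neg)
    # Step 1: minimum critical matching (Algorithm 1) and the noncritical leaves.
    if l <= m:
        pairs = [(pos[i], -neg[i + m - l]) for i in range(l)]
        rest = [-b for b in neg[:m - l]]
    else:
        pairs = [(pos[i + l - m], -neg[i]) for i in range(m)]
        rest = pos[:l - m]
    # Step 2: each pair (a, -b) is already an internal node with two leaf children.
    # Step 3: balanced tree over the pair sums and the noncritical leaves.
    return balanced(pairs + rest)


def min_cost(xs):
    """C* by exhaustive search (exponential; tiny inputs only)."""
    @lru_cache(maxsize=None)
    def best(roots):
        if len(roots) == 1:
            return 0
        result = None
        for i, j in combinations(range(len(roots)), 2):
            s = roots[i] + roots[j]
            rest = [r for k, r in enumerate(roots) if k != i and k != j]
            candidate = abs(s) + best(tuple(sorted(rest + [s])))
            if result is None or candidate < result:
                result = candidate
        return result
    return best(tuple(sorted(xs)))


def bound_holds(trials=200, seed=0):
    """Check C(T) <= 2(ceil(log2(n - 1)) + 1) C* on random small mixed-sign inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(2, 6)
        xs = [rng.randint(1, 9), -rng.randint(1, 9)]
        xs += [rng.choice([-1, 1]) * rng.randint(1, 9) for _ in range(n - 2)]
        cost = value_and_cost(algorithm2_tree(xs))[1]
        factor = 2 * (math.ceil(math.log2(n - 1)) + 1)
        if cost > factor * min_cost(xs):
            return False
    return True
```

For $X = \{3, -1, 2\}$, the sketch pairs $3$ with $-1$ and builds `((3, -1), 2)`, whose cost is $2 + 4 = 6$.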

Theorem 3.3. Let $T$ be the addition tree over $X$ constructed by Algorithm 2. If $X$
is sorted, then $T$ can be obtained in $O(n)$ time and $E(T) \le 2(\lceil \log(n-1) \rceil + 1)E(T_{\min})$.

Proof. Steps 2 and 3 of Algorithm 2 both take $O(n)$ time. By Lemma 3.2, Step 1
also takes $O(n)$ time, and thus Algorithm 2 takes $O(n)$ time. As for the error analysis,
let $T'$ be the addition tree constructed at Step 3, whose leaves are the sums from Step 2 and the noncritical leaves.
Then $C(T) = C(T') + \Pi$. Let $h$ be the number of levels of $T'$. Since $T'$ is a balanced tree,
each leaf of $T'$ contributes its absolute value to at most $h - 1$ internal nodes, so $C(T') \le (h - 1)(\Pi + \Delta)$ and
thus $C(T) \le h(\Pi + \Delta)$. By assumption, $X$ has at least two numbers with opposite
signs, so at most $n - 1$ numbers are added pairwise at Step 3. Thus,
$h \le \lceil \log(n - 1) \rceil + 1$. Next, by Lemma 3.1, since $R$ is a minimum critical matching of
$X$, we have $C(T_{\min}) \ge (\Pi + \Delta)/2$. In summary, $E(T) \le 2(\lceil \log(n - 1) \rceil + 1)E(T_{\min})$.
□

3.2. Improved approximation for single-sign X. This section assumes that
all $x_i$ are positive; the symmetric case where all $x_i$ are negative can be handled
similarly.

Let $T$ be an addition tree over $X$. Observe that $C(T) = \sum_{i=1}^{n} x_i d_i$, where $d_i$ is
the number of edges on the path from the root to the leaf $x_i$ in $T$. Hence, finding an
optimal addition tree over $X$ is equivalent to constructing a Huffman tree to encode
$n$ characters with frequencies $x_1, \ldots, x_n$ into binary strings.

Fact 3.1. If $X$ is unsorted (respectively, sorted), then a $T_{\min}$ over $X$ can be
constructed in $O(n \log n)$ (respectively, $O(n)$) time.

Proof. If $X$ is unsorted (respectively, sorted), then a Huffman tree over $X$ can be
constructed in $O(n \log n)$ [...] (respectively, $O(n)$ [...]) time. □

For the case where $X$ is unsorted, many applications require faster running time
than $O(n \log n)$. Previously, the best $O(n)$-time approximation algorithm used a
balanced addition tree and thus had a worst-case error at most $\lceil \log n \rceil E^*$. Here we
provide an $O(n)$-time approximation algorithm to compute the sum over $X$ with a
worst-case error at most $\lceil \log \log n \rceil E^*$. More generally, given an integer parameter
$t \ge 0$, we wish to find an addition tree $T$ over $X$ such that $C(T) \le C(T_{\min}) + t \cdot |S_n|$.

Algorithm 3.

1. Let $m = \lceil n/2^t \rceil$. Partition $X$ into $m$ disjoint sets $Z_1, \ldots, Z_m$ such that each
$Z_i$ has exactly $2^t$ numbers, except possibly $Z_m$, which may have fewer than $2^t$
numbers.

2. For each $Z_i$, let $z_i = \max\{x : x \in Z_i\}$. Let $M = \{z_i : 1 \le i \le m\}$.

3. For each $Z_i$, construct a balanced addition tree $T_i$ over $Z_i$.

4. Construct a Huffman tree $H$ over $M$.

5. Construct the final addition tree $T$ over $X$ from $H$ by replacing each $z_i$ with $T_i$.
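A compact sketch of Algorithm 3 follows. It assumes positive inputs, takes consecutive chunks of $2^t$ numbers as the partition in Step 1 (any partition with the stated sizes would do), and merges Steps 4 and 5 by letting each Huffman heap entry be keyed by $z_i$ while carrying $T_i$ as its payload. The helper names and the random spot-check of Theorem 3.4 are illustrative assumptions, not part of the paper.

```python
import heapq
import random
from itertools import count


def value_and_cost(tree):
    """Return (value, C(T)); a leaf is a number, an internal node is a pair."""
    if not isinstance(tree, tuple):
        return tree, 0
    lv, lc = value_and_cost(tree[0])
    rv, rc = value_and_cost(tree[1])
    return lv + rv, lc + rc + abs(lv + rv)


def balanced(items):
    """Balanced addition tree over a list of numbers, built by halving."""
    if len(items) == 1:
        return items[0]
    mid = len(items) // 2
    return (balanced(items[:mid]), balanced(items[mid:]))


def algorithm3_tree(xs, t):
    """Algorithm 3 (sketch) for positive, unsorted xs and an integer t >= 0."""
    size = 2 ** t
    # Step 1: consecutive chunks of 2^t numbers (the last chunk may be shorter).
    groups = [xs[i:i + size] for i in range(0, len(xs), size)]
    tie = count()
    # Steps 2-3: key each balanced tree T_i by z_i = max(Z_i).
    heap = [(max(g), next(tie), balanced(g)) for g in groups]
    heapq.heapify(heap)
    # Steps 4-5: Huffman over M, carrying T_i in place of z_i.
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tie), (t1, t2)))
    return heap[0][2]


def huffman_cost(xs):
    """C(T_min) for positive xs: the cost of a Huffman tree."""
    heap = list(xs)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        s = heapq.heappop(heap) + heapq.heappop(heap)
        total += s
        heapq.heappush(heap, s)
    return total


def theorem_3_4_holds(trials=300, seed=0):
    """Spot-check C(T) <= C(T_min) + t * S_n on random positive inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(1, 50) for _ in range(rng.randint(1, 40))]
        t = rng.randint(0, 3)
        if value_and_cost(algorithm3_tree(xs, t))[1] > huffman_cost(xs) + t * sum(xs):
            return False
    return True
```

With $t = 0$ every group is a single number, so the sketch reduces to a plain Huffman tree; for example, $X = \{1, 1, 1, 1, 100\}$ with $t = 1$ yields `(((1, 1), (1, 1)), 100)` of cost $112$.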
Theorem 3.4. Assume that all $x_i$ are positive. For any integer $t \ge 0$,
Algorithm 3 computes an addition tree $T$ over $X$ in $O(n + m \log m)$ time with $C(T) \le
C(T_{\min}) + t|S_n|$, where $m = \lceil n/2^t \rceil$. Since $|S_n| \le C(T_{\min})$, $E(T) \le (1 + t)E(T_{\min})$.

Proof. For an addition tree $L$ and a node $y$ in $L$, the depth of $y$ in $L$, denoted
by $d_L(y)$, is the number of edges on the path from the root of $L$ to $y$. Since $H$ is a
Huffman tree over $M \subseteq X$ and every $T_{\min}$ is a Huffman tree over $X$, there exists some
$T_{\min}$ such that for each $z_i$, its depth in $T_{\min}$ is at least its depth in $H$. Furthermore,
in $T_{\min}$, the depth of each $y \in Z_i$ is at least that of $z_i$. Therefore,

$$\sum_{i=1}^{m} \sum_{x_j \in Z_i} x_j \cdot d_H(z_i) \le C(T_{\min}).$$

Also note that for $x_j \in Z_i$, $d_T(x_j) - d_H(z_i) \le \log 2^t = t$. Hence,

$$C(T) = \sum_{x_j \in X} x_j \cdot d_T(x_j) = \sum_{i=1}^{m} \sum_{x_j \in Z_i} x_j \cdot d_H(z_i) + \sum_{i=1}^{m} \sum_{x_j \in Z_i} x_j \cdot \bigl(d_T(x_j) - d_H(z_i)\bigr) \le C(T_{\min}) + t \sum_{x_j \in X} x_j.$$

In summary, $C(T) \le C(T_{\min}) + tS_n$. Since Step 4 takes $O(m \log m)$ time and the
other steps take $O(n)$ time, the total running time of Algorithm 3 is as stated. □

Corollary 3.5. Assume that $n \ge 4$ and all $x_i$ are positive. Then,
setting $t = \lfloor \log((\log n) - 1) \rfloor$, Algorithm 3 finds an addition tree $T$ over $X$ in $O(n)$
time with $E(T) \le \lceil \log \log n \rceil E^*$.

Proof. Follows from Theorem 3.4. □
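The short calculation behind this corollary can be spelled out as follows, assuming (as elsewhere in the paper) that $\log$ denotes $\log_2$. It checks both the approximation factor $1 + t$ from Theorem 3.4 and the linear running time; this is a sketch of the omitted arithmetic, not a quotation of the original proof.

$$
\begin{aligned}
&\text{Since } n \ge 4, \ \log n - 1 \ge 1, \text{ so } t \ge 0 \text{ and } 2^t \le \log n - 1 < 2^{t+1}.\\
&\text{Let } k = \lceil \log\log n \rceil. \text{ Then } \log n - 1 < \log n \le 2^k, \text{ so } \log(\log n - 1) < k \text{ and hence } t \le k - 1.\\
&\text{Therefore } E(T) \le (1 + t)E^* \le \lceil \log\log n \rceil E^*.\\
&\text{Also } m = \lceil n/2^t \rceil < \frac{2n}{\log n - 1} + 1, \text{ so } m \log m \le m \log n < \frac{2n \log n}{\log n - 1} + \log n \le 4n + \log n,
\end{aligned}
$$

where the last step uses $\log n / (\log n - 1) \le 2$ for $\log n \ge 2$. Thus the $O(n + m \log m)$ bound of Theorem 3.4 is $O(n)$.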